Import Libraries

1) Prepare the data for analysis: treat missing data, incorrect data, and outliers

Read the data

Outliers
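
One common way to treat outliers is to cap values at the interquartile-range (IQR) fences. The sketch below assumes that approach; the notebook's actual method is not shown, and the `MonthlyIncome` values and the helper `cap_outliers_iqr` are hypothetical.

```python
import pandas as pd

# Hypothetical toy column with one extreme value (not the notebook's data).
df = pd.DataFrame({"MonthlyIncome": [2000, 2500, 3000, 3200, 3500, 4000, 50000]})

def cap_outliers_iqr(series, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those fences."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

df["MonthlyIncome_capped"] = cap_outliers_iqr(df["MonthlyIncome"])
print(df)
```

Capping keeps every row, which matters when the data set is small; dropping the rows instead is the alternative if the extreme values are known to be errors.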

2) Create new/derived predictors (e.g. Age group) for analysis

One-hot encoding (OHE) of categorical columns
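
A minimal sketch of one-hot encoding with `pandas.get_dummies`. The three-row frame and its column names are illustrative assumptions, and `drop_first=True` is one possible choice for avoiding a redundant dummy column in the regression models.

```python
import pandas as pd

# Hypothetical rows; column names are assumptions for illustration.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "EducationField": ["Medical", "Marketing", "Medical"],
    "Age": [30, 41, 25],
})

# Replace each categorical column with 0/1 indicator columns.
# drop_first=True drops one level per column to avoid perfect collinearity.
encoded = pd.get_dummies(
    df, columns=["Gender", "EducationField"], drop_first=True, dtype=int
)
print(encoded)
```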

Standardization of continuous columns
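
Standardization rescales each continuous column to zero mean and unit variance. The sketch below assumes scikit-learn's `StandardScaler`; the column names and values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical continuous columns (not the notebook's data).
df = pd.DataFrame({
    "Age": [25, 32, 41, 50, 38],
    "MonthlyIncome": [2500, 4200, 6100, 9800, 5300],
})
continuous_cols = ["Age", "MonthlyIncome"]

# z = (x - mean) / std, computed per column.
scaler = StandardScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(df[continuous_cols]),
    columns=continuous_cols,
    index=df.index,
)
print(scaled.round(3))
```

In a modeling workflow the scaler would normally be fitted on the training split only and then applied to the test split, so that test data does not influence the scaling parameters.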

3) Explore the Data using Exploratory Data Analysis - For Y and all Xs

4) Explore the Data using Exploratory Data Analysis - For pairs of Y and all Xs

5) Visualize all the distributions and relationships

6) Perform tests of hypothesis: compare rates for male and female employees at the same level, and check relationships between categorical variables such as Age and Gender, Gender and Education Field, Age and Income, etc.

Cross tabulation
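
A cross tabulation pairs naturally with a chi-square test of independence. The sketch below assumes `pandas.crosstab` and `scipy.stats.chi2_contingency`; the eight rows are made up for illustration and are far too few for a meaningful test.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical categorical data (not the notebook's data).
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female",
               "Male", "Female", "Male", "Female"],
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No", "No", "Yes"],
})

# Contingency table of counts: rows are Gender, columns are Attrition.
table = pd.crosstab(df["Gender"], df["Attrition"])
print(table)

# H0: the two variables are independent.
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

A small p-value (commonly below 0.05) would be read as evidence against independence of the two variables.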

7) Perform regression treating Monthly Rate as Y, and choose a prediction error metric and the best model

Multicollinearity
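
Multicollinearity is often checked with the variance inflation factor (VIF). The sketch below is one way to compute it with NumPy alone, using the identity that the VIFs are the diagonal of the inverse correlation matrix. The three predictors are synthetic, and whether the notebook used VIF or a correlation heatmap is not known.

```python
import numpy as np

# Synthetic predictors: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the
# other predictors. This equals the j-th diagonal entry of the inverse
# of the predictors' correlation matrix.
corr = np.corrcoef(X, rowvar=False)
vif = np.diag(np.linalg.inv(corr))

for name, value in zip(["x1", "x2", "x3"], vif):
    print(f"{name}: VIF = {value:.2f}")
```

A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a sign that a predictor is largely explained by the others.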

Divide data in training & testing set
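
A minimal sketch of the split using scikit-learn's `train_test_split`. The synthetic `X` and `y`, the 80/20 ratio, and the `random_state` value are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic predictors and target standing in for the prepared data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Hold out 20% of the rows for testing; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```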

Model Building

Linear Regression

Lasso Regression

Ridge Regression

KNeighborsRegressor

Decision Tree
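
The regressors in this section can be compared on a common prediction error. The sketch below assumes root mean squared error (RMSE) on the test split as that metric and uses synthetic data from `make_regression`; the hyperparameters shown are illustrative, not the notebook's.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem standing in for the prepared data.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Lasso Regression": Lasso(alpha=1.0),
    "Ridge Regression": Ridge(alpha=1.0),
    "KNeighborsRegressor": KNeighborsRegressor(n_neighbors=5),
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=0),
}

# Fit each model and record its test-set RMSE.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

best_model = min(rmse, key=rmse.get)
for name, value in sorted(rmse.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {value:.2f}")
print("Lowest test RMSE:", best_model)
```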

8) Form Classification Model using Y=attrition and choose the best model

Divide data in training & testing set

Model Building

Logistic Regression

KNN

Bayesian

Decision Tree

Hyperparameter tuning
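
A minimal sketch of tuning the decision tree with a cross-validated grid search. `GridSearchCV` is assumed as the tuning method, and the parameter grid and synthetic data are illustrative rather than taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the attrition data.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hypothetical search grid; the values are illustrative.
param_grid = {
    "max_depth": [2, 4, 6, None],
    "min_samples_leaf": [1, 5, 10],
    "criterion": ["gini", "entropy"],
}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_tree = search.best_estimator_
test_accuracy = best_tree.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Test accuracy: {test_accuracy:.3f}")
```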

Plot decision tree after hyperparameter tuning

SVM

ANN

Final Predictions

The logistic regression model gives high accuracy, so it is selected as the best model.

9) Clustering - Find interesting clusters using K-means, hierarchical, and DBSCAN clustering. Connect them to the domain scenario and their usefulness in analysis (ignore the Attrition column)

a) Grouping of Employees

Reducing the dimensionality of the Data
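
The heading does not name the reduction technique; the sketch below assumes principal component analysis (PCA) down to two components, applied to standardized synthetic features.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrix standing in for the employee features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))

# PCA is scale-sensitive, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the two directions of largest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
explained = float(pca.explained_variance_ratio_.sum())
print(X_2d.shape, f"variance explained: {explained:.2%}")
```

Two components make the clusters easy to plot, at the cost of discarding whatever variance the remaining components carry.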

K-Means Clustering

Elbow method
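
The elbow method fits K-means for a range of k and looks for the point where the within-cluster sum of squares (inertia) stops dropping sharply. The sketch below uses synthetic blobs and a range of 1 to 8, both assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D points standing in for the reduced employee data.
X, _ = make_blobs(n_samples=300, centers=5, cluster_std=0.8, random_state=0)

# Record the inertia (within-cluster sum of squares) for each k.
k_values = list(range(1, 9))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting `k_values` against `inertias` (for example with matplotlib) shows the bend; the k at the bend is taken as the number of clusters.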

The elbow, i.e. the optimal number of clusters, is at k = 5.

Grouping of employees

Hierarchical clustering
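
A minimal sketch of agglomerative (bottom-up hierarchical) clustering with scikit-learn's `AgglomerativeClustering`. Ward linkage and the two-cluster cut are assumptions, and the points are a tiny hand-built example of two well-separated groups.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two tight 3x3 grids of points, one near (0, 0) and one near (10, 10).
grid = np.array([[i * 0.1, j * 0.1] for i in range(3) for j in range(3)])
X = np.vstack([grid, grid + 10.0])

# Merge points bottom-up with Ward linkage until two clusters remain.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)
```

On real data, a dendrogram (for example via `scipy.cluster.hierarchy`) is a common way to choose where to cut the tree.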

DBSCAN Clustering
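
DBSCAN groups points by density and marks isolated points as noise with the label -1. The sketch below uses a tiny hand-built data set; the `eps` and `min_samples` values are illustrative and would need tuning on real, scaled data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense 3x3 grids plus one isolated point far from both.
grid = np.array([[i * 0.1, j * 0.1] for i in range(3) for j in range(3)])
X = np.vstack([grid, grid + 10.0, [[50.0, 50.0]]])

# A point is a core point if at least min_samples points (itself included)
# lie within distance eps of it.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int((labels == -1).sum())
print("clusters:", n_clusters, "noise points:", n_noise)
# prints: clusters: 2 noise points: 1
```

Unlike K-means, DBSCAN does not need the number of clusters up front, which makes it a useful cross-check on the k chosen by the elbow method.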